A small fast neutron (FN) mutant population has been established from Phaseolus
vulgaris cv. Red Hawk. We leveraged the available P. vulgaris genome sequence and
high-throughput next-generation DNA sequencing to examine the genomic structure
of five P. vulgaris cv. Red Hawk FN mutants with striking visual phenotypes. Analysis
of these genomes identified three classes of structural variation (SV): between-cultivar
variation, natural variation within the FN mutant population, and FN-induced mutagenesis.
Our analyses focused on the latter two classes. We identified 23 large deletions
(>40 bp) common to multiple individuals, illustrating residual heterogeneity and regions
of SV within the common bean cv. Red Hawk. An additional 18 large deletions were
identified in individual mutant plants. These deletions, ranging in size from 40 bp to
43,000 bp, are potentially the result of FN mutagenesis. Six of the 18 deletions lie near
or within gene coding regions, identifying potential candidate genes causing the mutant
phenotype.

Keywords: Phaseolus vulgaris, common bean, natural variation, structural variation, fast neutron mutation, 
DNA-Seq 



INTRODUCTION 

Common bean, Phaseolus vulgaris L., is an important source of
proteins and carbohydrates for over three million people worldwide
(Broughton et al., 2003). Despite its dietary importance,
genetic resources for common bean have lagged behind those
of the "model legumes" soybean, Medicago truncatula, and Lotus
japonicus. However, next generation sequencing (NGS) technologies
now make genomic studies applicable to any species of
interest.

Mutants are important tools in deciphering gene functions.
In common bean, individual mutants can be created for a gene
of interest through plant transformation (Aragao et al., 1996;
Kwapata et al., 2012). Expression of specific genes in common bean
can also be knocked out or down through virus-induced gene
silencing (Diaz-Camino et al., 2011; Zhang et al., 2013). Both
methods require prior knowledge of the genes of interest. In contrast,
mutant populations can be screened for phenotypes of interest and the
responsible genes identified through various approaches. Mutant
screens have improved the analysis of traits in various species,
including traits related to plant architecture, yield, and stress
response (Papdi et al., 2010; Bolon et al., 2011; Ma et al., 2013).



Structural variation (SV), including presence-absence variation,
inter- and intra-chromosomal translocations, insertions,
and deletions, is believed to be an important component of phenotypic
diversity in both plants and animals (Lai et al., 2010;
Stankiewicz and Lupski, 2010; Cao et al., 2011; Eichten et al.,
2011; Wang et al., 2011; McHale et al., 2012). Genomic variation
both between and within cultivars has been identified in soybean
(Bolon et al., 2011; Haun et al., 2011; McHale et al., 2012), maize
(Lai et al., 2010), rice (Huang et al., 2012), and Arabidopsis (Cao
et al., 2011; Belfield et al., 2012). Two studies in soybean examining
SV between cultivars (McHale et al., 2012) and within the
Williams 82 cultivar (Haun et al., 2011) determined that the genomic
regions most enriched for SV were gene-rich regions, particularly
regions containing resistance genes.

Here, we present a small fast neutron (FN) mutant population
for common bean and demonstrate how NGS technologies
such as DNA-seq provide fast, high-quality analysis of genomic
variation to identify potential candidate genes for observed phenotypes.
Additionally, the DNA-seq data allowed us to examine
the natural variation existing within the Red Hawk cultivar. Such
natural variation is a rich source of genomic diversity that can be
utilized in future cultivar development.
MATERIALS AND METHODS

DEVELOPMENT OF Phaseolus vulgaris FAST NEUTRON POPULATION 

Ten thousand seeds of P. vulgaris cv. Red Hawk (Kelly et al., 1998), an
Andean cultivar adapted for growth in the upper Midwest, were
sent to the McClellan Nuclear Radiation Center at the University
of California-Davis for irradiation. Five thousand seeds each were
treated with either 16 or 32 Gy of FN radiation. Treated seeds
were sent to the Illinois Crop Improvement Association (ICIA)
facility in Puerto Rico in November 2009 along with 200 wild
type seeds from the same seed lot. Approximately 70% of the 5000
seeds treated with 16 Gy germinated, while none of the seeds
treated with 32 Gy germinated. Seedlings were allowed to mature
at the ICIA facility, where plants were phenotyped and seeds were
collected from all mature plants. Seeds from ~88 plants with
striking mutant phenotypes, such as developmental delays and
variations in plant stature, pod set, and pod size, were harvested
individually. Remaining mutant plants were bulk harvested. Wild type
plants grown at ICIA were also bulk harvested.

Seeds from individually collected plants were planted at the
University of Minnesota Experiment Station in St. Paul in 2010.
Approximately 10,000 seeds from the bulk collection of mutants
were also planted. Phenotyping was performed throughout the
growth season, complemented by photographs. Selected individuals
with visible and/or maturity phenotype variations were
harvested. In 2011, seeds from selected 2010 M2 individuals
were planted in 10 ft rows (~20 seeds). Phenotypes observed
throughout the 2011 growth season were compared to documented
phenotypes from previous years to determine if trait
expression was consistent. Additionally, segregation among the
20 plants per mutant line was noted. Three to four individuals
in each row with visible, stable traits were tagged and photographed,
and their seed was harvested.

DNA-SEQ ANALYSIS OF FAST NEUTRON MUTANTS 

Five FN mutant plants with different, stable, obvious phenotypes
(Figure 1) were chosen for paired-end sequence analysis.
The following FN mutant plants were chosen: 1R5C01r5CPVMN11, a
plant with decorative chlorotic leaves early in the growing season
(Figure 1A); 1R19C15r28CPVMN11, a small plant with lanceolate
leaves (Figure 1B); 1R22C04r31CPVMN11, an upright plant
with rugose leaves (Figure 1C); 2R29C12r78CPVMN11, which
phenotypically resembled the wild type plant but was delayed in
maturity (Figure 1D); and 3R5C25r87CPVMN11, which exhibited
interveinal chlorosis (Figure 1E). The mutant plants will




FIGURE 1 | Visual phenotype of five Phaseolus vulgaris cv. Red
Hawk mutants from the fast neutron mutant population chosen for
DNA-seq. All plants are from the M2 generation and were grown at
the University of Minnesota Experiment Station in St. Paul, MN in 2011.
(A) Mutant 1R5C01r5CPVMN11, referred to as decorative due to the
chlorotic patterning on leaves early in the growing season. (B) Mutant
1R19C15r28CPVMN11, referred to as lanceolate due to the elongated
shape of the leaf. This mutant also appeared shorter than the WT
plants in the field. (C) Mutant 1R22C04r31CPVMN11, referred to as
rugose due to the crinkled leaf texture. The petioles of this plant
also appeared shorter and more upright than the WT. (D) Mutant
2R29C12r78CPVMN11, referred to as the maturity mutant. This plant is
phenotypically identical to the WT except for a delay in maturity. (E)
Mutant 3R5C25r87CPVMN11, referred to as the chlorotic mutant due to
the interveinal chlorosis patterning observed in the leaves. (F) Wild-type
Phaseolus vulgaris cv. Red Hawk for comparison.
respectively be referred to as decorative, lanceolate, rugose,
maturity, and chlorotic throughout the rest of the manuscript.
M3 seeds of the plants chosen for sequencing were collected and
planted at the University of Minnesota Experiment Station in St.
Paul in 2012 to ensure the phenotype was maintained through
the M3 generation. Leaf tissue from a representative wild type
plant and from each of the chosen mutant plants at the M2
generation was collected from 2011 field-grown plants early in
the morning and immediately placed at -80°C to inhibit DNA
degradation. DNA from all six plants (WT and five mutants)
was extracted using the phenol:chloroform method as described
(Liu et al., 1997). Each DNA sample was visually inspected on a
1% agarose gel to ensure that the samples were not degraded.
DNA concentration and purity were assessed using an Agilent
2100® Bioanalyzer™ (Agilent®, Santa Clara, CA). DNA samples
were submitted to the molecular biology core at the Mayo Clinic,
Rochester, MN for paired-end sequencing on an Illumina HiSeq
2000. To reduce variability, DNA from all samples was multiplexed
and run in a single lane. Low quality reads and adaptor
sequences were removed, resulting in 31 million paired-end reads
per sample.

Paired-end genome sequences were mapped to the P. vulgaris
G19833 genome sequence available at www.phytozome.net using
BWA (Li and Durbin, 2009) with default parameters. The resulting
mapping files were sorted, indexed, and converted
to binary format (BAM files) using SAMtools (Li et al., 2009).
The sequence alignments were visualized using IGV (Robinson
et al., 2011). This approach aligned 70% of all Red Hawk DNA
sequences to 88% of each of the 11 P. vulgaris chromosomes with
12X sequence depth. Regions of genomic deletions were identified
using custom perl scripts. To confirm these deletions, the program
CREST (Wang et al., 2011) was also used to screen the FN
mutant plants. This program identifies genomic deletions, insertions,
inversions, and translocations from soft-clipped
reads and the read coverage at potential breakpoints, calculating
whether the probability of observing the number of soft-clipped
reads at a given location is <0.05 based on a binomial distribution.
Statistically significant (P < 0.05) SV were retained for
further consideration. CREST analysis was performed for each
mutant compared to the wild type control. Genomic deletions
resulting from differences between cultivars were masked using
the -g function. We chose to focus our analysis on genomic deletions
as these are called with the greatest confidence and can be
confirmed by visual screening of genomic alignments. Deletions
>40 bp identified by both perl scripts and CREST analysis were
further characterized by genic location: intergenic, promoter,
exon, intron, and 3'UTR. Genomic regions of natural variation
within the cultivar were identified by finding deletions common
to multiple, but not all, FN mutant plants (Figures 2A, 3).
Single nucleotide polymorphisms (SNPs) and small (<40 bp) insertions
and deletions (INDELs), either unique to a single plant
or common to multiple plants, were called using the pileup
function in SAMtools (Li et al., 2009). Unique and common SNPs
and INDELs were then separated using the compareBed function of
BEDtools (Quinlan and Hall, 2010) and custom perl scripts. SNPs
and INDELs with a read depth <6 (half the average genome coverage)
were removed from further analysis. Custom perl scripts
were used to characterize SNPs and INDELs by genic location as
described above.
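
The custom perl scripts are not described in detail here. As a rough illustration only, and assuming that a candidate deletion is a run of more than 40 consecutive reference positions with zero read depth, the scan could be sketched in Python as follows. The function name and the toy depth vector are hypothetical and are not taken from the original scripts.

```python
from typing import Iterable, List, Tuple


def find_zero_coverage_runs(depth: Iterable[int],
                            min_len: int = 41) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) runs of zero read depth.

    Only runs of at least `min_len` positions are kept; the default of 41
    corresponds to the ">40 bp" size cut-off used for large deletions.
    """
    runs = []
    run_start = None
    pos = 0
    for pos, d in enumerate(depth, start=1):
        if d == 0:
            if run_start is None:
                run_start = pos          # a new uncovered run begins here
        else:
            if run_start is not None and pos - run_start >= min_len:
                runs.append((run_start, pos - 1))
            run_start = None
    # Handle a run that extends to the end of the sequence.
    if run_start is not None and pos - run_start + 1 >= min_len:
        runs.append((run_start, pos))
    return runs


# Toy per-base depth: a 50 bp gap is reported, a 5 bp gap is ignored.
toy_depth = [12] * 10 + [0] * 50 + [12] * 10 + [0] * 5 + [11] * 3
candidate_deletions = find_zero_coverage_runs(toy_depth)
```

In practice such candidates would still need to be compared against the wild type sample and confirmed by a breakpoint-aware caller such as CREST, as described above.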

RESULTS 

MUTANT COLLECTION 

FN mutant populations have proven to be a valuable asset for
genetic studies in a variety of crop species (Starker et al., 2006;
Bolon et al., 2011; Xiao et al., 2011). We have established a
small FN mutant population for common bean using P. vulgaris
cv. Red Hawk. Seeds from 88 plants with stable, visual
phenotypes are available for public use (Table 1) by contacting
Dr. Carroll Vance at vance004@umn.edu or Jeff Roesler at
roess001@umn.edu. Phenotypes observed in the field include
varying degrees and types of chlorosis and altered maturity (usually
delayed). Additionally, bulk seed from M2 and M3 plants is
available for researchers wishing to screen mutants for a particular
trait of interest.

USE OF DNA-SEQ TO IDENTIFY REGIONS OF STRUCTURAL VARIATION 

The DNA-seq reads from the five FN mutant lines and one wild-type
cv. Red Hawk individual were mapped to the common bean
reference genome sequence (accession G19833). This analysis
allowed us to identify three classes of SV (Figure 2): (1) sequences
present in the reference genome sequence, but absent from all six
of the Red Hawk individuals (Figure 2B); (2) sequences present in
both the reference genome sequence and at least one Red Hawk
individual, but absent in two or more Red Hawk individuals
(Figure 2A); (3) sequences absent from one FN line, but present
in all other samples (Figure 2C).
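
The three class definitions reduce to a simple rule on which of the six sequenced individuals lack a region that is present in the reference. A minimal sketch of that rule is given below; the function name, the sample labels, and the "unclassified" outcome for a region missing only from the wild type plant are assumptions made for illustration.

```python
from typing import Dict

# Labels for the six sequenced Red Hawk individuals (hypothetical keys).
SAMPLES = ["WT", "decorative", "lanceolate", "rugose", "maturity", "chlorotic"]


def classify_sv(absent: Dict[str, bool], wild_type: str = "WT") -> str:
    """Classify a region that is present in the reference genome.

    `absent[sample]` is True when the region is missing from that sample.
    """
    n_absent = sum(absent.values())
    if n_absent == 0:
        return "no deletion"
    if n_absent == len(absent):
        return "Class 1"   # missing from every Red Hawk individual
    if n_absent >= 2:
        return "Class 2"   # missing from several, but not all, individuals
    # Exactly one individual lacks the region.
    return "Class 3" if not absent[wild_type] else "unclassified"


def calls(missing):
    """Build the absent/present map from a list of samples lacking the region."""
    return {s: s in missing for s in SAMPLES}


example = classify_sv(calls(["lanceolate", "chlorotic"]))
```

The example call corresponds to the situation shown in Figure 2A, where a region is missing only from the lanceolate and chlorotic individuals.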

The Class 1 group primarily represents intra-specific structural
genomic differences between accession G19833 and cv. Red
Hawk. We identified over one thousand of these regions, in which
all six Red Hawk individuals were missing >100 bp that is present
in the reference genome. This is an interesting group of SV to
catalog and may have important implications for understanding
inter-cultivar phenotypic variation. However, these features
do not inform our understanding of the mutant phenotypes
in the FN lines. Therefore, we chose to focus the data analysis
on the Class 2 and Class 3 groups, which exhibited structural
polymorphism among the FN individuals.

The Class 2 group consists of DNA segments present in some
FN individuals, but missing in at least two others (Figure 2A).
It is highly unlikely the same genomic regions would be deleted
in multiple plants by FN irradiation. Our analysis identified
24 genomic deletions belonging to Class 2, which illustrates
the natural variation within the common bean cultivar Red
Hawk. Specifically, ten unique deletions (P < 0.05) are shared
by the lanceolate and chlorotic mutants on chromosome 1, nine
genomic deletions were identified on chromosomes 2 and 11 in
three mutant plants (lanceolate, maturity, and chlorotic), four
deletions on chromosome 4 are common to the maturity and
chlorotic mutants, and one deletion on chromosome 11 is shared
by the rugose and maturity mutants (Figure 3, Table 2). Class 2
deletions range in size from 41 base pairs (bp) to 12,111 bp. Of all
of these deletions, two are within gene introns and two are within
1000 bp upstream of the start codon or 1000 bp downstream of
the stop codon, regions involved in regulating gene expression
FIGURE 2 | Example of sequence alignments illustrating cultivar
heterogeneity and fast neutron induced deletions. Paired-end sequences
from six Red Hawk individuals were aligned to the G19833 genome
sequence available at www.phytozome.net using BWA (Li and Durbin, 2009)
with default parameters. Using CREST and custom perl scripts, we identified
statistically significant deletions illustrating three classes of SV. SV was
visualized using IGV (Robinson et al., 2011). These regions of SV were all
identified on chromosome 1, from the two regions highlighted by filled black
boxes above the alignments. The double vertical line within the alignments
represents the chromosomal region between the two genomic regions depicted.
For each individual, a histogram plot illustrates the read depth with individual
reads plotted below. Black boxes highlight the three classes of SV. (A) Class
2, regions of genomic heterogeneity within the Red Hawk cultivar. These
deletions were identified in two or more Red Hawk individuals and represent
the residual heterogeneity in the fast neutron mutant population. This
deletion spans 3408 bp on chromosome 1 and is only evident in the
lanceolate and chlorotic individuals. (B) Class 1, sequences present in the
reference genome, but missing in all Red Hawk individuals. These regions
illustrate the differences between the common bean cultivars. This particular
region spans 5000 bp of the reference genome. (C) Class 3, sequences
missing in a single individual but present in all other lines. This class is most
likely the result of fast neutron mutagenesis and may be responsible for the
mutant phenotype. This particular deletion is approximately 1500 bp long in
the chlorotic mutant and is immediately downstream of the predicted gene
Phvul.001G128600 (shown in blue below alignments), which encodes a RecA
protein.



patterns (Table 2). All Class 2 deletions on chromosome 2 lie
within 2.7 million bp of each other, in a region exhibiting
high heterogeneity (Figure 3). Within this region three individual
FN lines share eight Class 2 deletions. The region is unaffected in
the remaining two FN lines and the wild type plant.

The remaining 18 deletions identified by CREST (P < 0.05)
belong to Class 3. This class is composed of sequences absent
from a single FN line, but present in all other samples (Figure 2C,
Table 3). Deletions belonging to Class 3 range in size from
41 to over 43,000 bp and are found on chromosomes 1, 4, 5,
7, 8, and 9. Twelve of the eighteen Class 3 deletions are in
intergenic regions of the genome, though as genome annotation
improves these regions may prove to contain as yet unpredicted
genes. Six deletions belonging to Class 3 have the potential to
alter gene expression patterns. Regions immediately upstream
or downstream of gene coding regions are likely involved in
regulating gene expression. Three deletions are located within
1000 bp of gene start or stop codons of Phvul.001G128600,
Phvul.004G029200, Phvul.004G031900, and Phvul.004G032000
in the chlorotic and maturity mutants. The latter two genes are
tightly linked in the genome, so a single deletion may affect
the expression of multiple genes. Two Class 3 deletions shorten
the introns of Phvul.001G050100 in the lanceolate mutant and
Phvul.004G031200 in the maturity mutant. Finally, a 43,034 bp
deletion on chromosome 8 in the chlorotic mutant removes the
entire sequence for Phvul.008G141500. In summary, we were able
to identify statistically significant SV belonging to Class 3 within
either the coding region or the potential regulatory region of
predicted genes for three of the five mutant plants (lanceolate,
maturity, and chlorotic). Genes impacted by these deletions
represent the most likely candidates responsible for the mutant
phenotype of the lanceolate, maturity, and chlorotic mutants.
For decorative and rugose, our analysis pipeline failed to identify
Class 3 deletions within the regulatory or coding region of genes.
These phenotypes may be a result of a heterozygous deletion, a
small (<40 bp) INDEL, or a SNP.
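
The placement of a deletion relative to a gene model can be expressed as a small interval test. The sketch below is illustrative only: it uses the P/I/D codes of Table 2 (within 1000 bp of the start codon, intron, within 1000 bp of the stop codon), while the `Gene` record, the function names, and the example coordinates are hypothetical and are not part of the original pipeline.

```python
from dataclasses import dataclass
from typing import List, Tuple

FLANK = 1000  # bp inspected on either side of the start-to-stop span


@dataclass
class Gene:
    start: int                    # leftmost base of the start-to-stop span (1-based)
    end: int                      # rightmost base of that span
    strand: str                   # "+" or "-"
    exons: List[Tuple[int, int]]  # inclusive exon intervals


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True when two inclusive intervals share at least one base."""
    return a_start <= b_end and b_start <= a_end


def locate_deletion(del_start: int, del_end: int, gene: Gene) -> List[str]:
    """Return location codes for a deletion relative to one gene."""
    left = (gene.start - FLANK, gene.start - 1)
    right = (gene.end + 1, gene.end + FLANK)
    # On the minus strand the start codon sits at the right-hand end.
    upstream, downstream = (left, right) if gene.strand == "+" else (right, left)
    labels = []
    if overlaps(del_start, del_end, *upstream):
        labels.append("P")
    if overlaps(del_start, del_end, gene.start, gene.end):
        hits_exon = any(overlaps(del_start, del_end, s, e) for s, e in gene.exons)
        labels.append("exon" if hits_exon else "I")
    if overlaps(del_start, del_end, *downstream):
        labels.append("D")
    return labels or ["intergenic"]


# Hypothetical two-exon gene on the plus strand.
toy_gene = Gene(start=5000, end=9000, strand="+",
                exons=[(5000, 5500), (7000, 9000)])
intron_hit = locate_deletion(6000, 6100, toy_gene)
```

A deletion that removes an entire gene, such as the 43,034 bp deletion on chromosome 8, would be reported by this sketch with all of the P, exon, and D codes.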
FIGURE 3 | CREST analysis identifies SV belonging to Class 2 and
Class 3. The number of large genomic deletions in each class identified per
500,000 bp on each chromosome of P. vulgaris was counted and plotted as a
vertical line. The height of the line indicates the number of SV per 500,000 bp
region that were identified. The largest number of SV per chromosomal
region is noted to the left of each chromosome. The mutant containing the
deletion is represented by color: blue, chlorotic; purple, maturity; green,
decorative; red, rugose; orange, lanceolate. Deletions belonging to Class 3
are likely a result of FN mutagenesis and are highlighted by an asterisk (*).
Deletions potentially impacting gene expression are highlighted with an E.
Note the region of natural variation (SV Class 2) shared by three mutant
plants on chromosome 2.
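
The per-window counts plotted in Figure 3 can be reproduced with a simple binning step. The sketch below assumes that each deletion is assigned to the 500,000 bp window containing its start coordinate, which is an assumption rather than a documented detail of the figure; the function name is hypothetical. The input uses the Class 2 start coordinates on chromosomes 4 and 11 listed in Table 2.

```python
from collections import Counter
from typing import Iterable, Tuple

BIN_SIZE = 500_000  # window width used in Figure 3


def count_per_bin(deletions: Iterable[Tuple[str, int]],
                  bin_size: int = BIN_SIZE) -> Counter:
    """Count deletions per (chromosome, window start) using the start coordinate."""
    counts = Counter()
    for chrom, start in deletions:
        window_start = (start // bin_size) * bin_size
        counts[(chrom, window_start)] += 1
    return counts


# Class 2 deletion start positions on Chr04 and Chr11 from Table 2.
class2_starts = [
    ("Chr04", 3745720), ("Chr04", 4160322),
    ("Chr04", 4374139), ("Chr04", 4571494),
    ("Chr11", 5406621), ("Chr11", 19639745),
]
window_counts = count_per_bin(class2_starts)
```

With these inputs, the window starting at 4,000,000 bp on chromosome 4 holds two deletions and every other occupied window holds one.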



DNA-seq permits genome analyses on a base pair scale. Using
the approach described in the materials and methods, we estimated
that there are 32,499 SNPs and 20,363 INDELs shared by
multiple FN lines, most likely representing the natural variation
caused by genetic heterogeneity within the Red Hawk cultivar
(Class 2). We also estimated 92,205 SNPs and 20,340 INDELs
unique to a single FN line (Class 3). As described earlier, Class 3
SNPs and INDELs are most likely a result of the FN mutagenesis.
Both Class 2 and Class 3 SNPs and INDELs were mapped to the
available P. vulgaris genome to determine whether the change in
Table 1 | Phaseolus vulgaris cv. Red Hawk fast neutron mutant population.

Plant ID | 2010 Phenotype | 2011 Phenotype | Mat | M3 Seg | Generation
[illegible] | Erect growth | Erect growth | N | N | M3
[illegible] | Stunted dwarf, slt chlorotic | Stunted dwarf, slt chlorotic | L | N | M3
[illegible] | Stunted, delayed | Stunted | N | N | M3
[illegible] | Large leaves, chlorotic | WT pheno | [illegible] | N | M3
[illegible] | Stunted, delayed | Stunted, delayed | L | N | M3
[illegible] | Stunted, delayed | Stunted, delayed | L | N | M3
[illegible] | Stunted, delayed | Delayed growth | L | N | M3
[illegible] | Few leaves, flowering | WT pheno | N | N | M3
[illegible] | Delayed growth | Chlorotic | N | N | M3
[illegible] | Lanceolate leaves | Lanceolate leaves | L | N | M3
[illegible] | Few leaves, flowering | WT pheno | N | N | M3
[illegible] | Late maturity | Small plant | L | 1:1 | M3
[illegible] | Light green leaves | Light green leaves | N | N | M3
1R15C05r22CPVMN11 | Stunted | Stunted | L | 3:1 | M3
1R15C10r23CPVMN11 | Few leaves | WT pheno, note maturity | E | N | M3
1R16C03r24CPVMN10 | Erect growth | WT pheno | N | N | M3
1R16C09r25CPVMN11 | Few leaves | WT pheno | N | N | M3
1R17C13r26CPVMN11 | Erect, large leaves | Erect, large leaves | N | N | M3
1R18C18r27CPVMN11 | Tall, bushy, erect | WT pheno | N | N | M3
1R19C15r28CPVMN11 | Lanceolate leaves | Lanceolate leaves, long petioles, some chlorosis | L | N | M3
1R20C06r29CPVMN11 | Few leaves, flowering | WT pheno | N | N | M3
1R20C08r30CPVMN11 | Delayed growth | WT pheno, note maturity | E | N | M3
1R22C04r31CPVMN11 | Delayed growth | Rugose, stunted | L | N | M3
1R22C15r32CPVMN11 | Leaf size and shape | Large leaves, long petioles | N | N | M3
1R22C27r33CPVMN11 | Large leaves | Bushy, lots of leaves | N | N | M3
1R22C37r34CPVMN11 | Leaf size and shape | WT pheno | N | N | M3
1R23C21r35CPVMN11 | Small leaves | Lanceolate leaves, long petioles, some chlorosis | N | N | M3
[illegible] | Large leaves | Large leaves | [illegible] | N | M3
[illegible] | Lacks apical dominance, sprawling | Lacks apical dominance, sprawling | N | [illegible] | M3
[illegible] | Bushy | Bushy | N | N | M3
[illegible] | Lanceolate leaves | WT pheno, note maturity | [illegible] | N | M3
[illegible] | Stunted | Stunted | L | N | M3
[illegible] | Bushy | Bushy | N | 1:1 | M3
[illegible] | Chlorotic | Chlorotic | L | N | M3
[illegible] | Large leaves | Large leaves | N | N | M3
[illegible] | Few leaves, flowering | WT pheno, note maturity | [illegible] | N | M3
[illegible] | Few leaves, flowering | Few leaves | N | N | M3
[illegible] | Late maturity | Late maturity | L | N | M3
1R32C11r52CPVMN11 | Short plant | Short plant | N | N | M3
1R32C17r53CPVMN11 | Bushy | Bushy | N | N | M3
1R33C17r54CPVMN11 | Large, lanceolate leaves | Large, lanceolate leaves | N | N | M3
1R33C32r55CPVMN11 | Short, bushy and slightly rugose | Short, bushy | N | N | M3
1R35C27r56CPVMN11 | Tall and bushy | Bushy | N | 1:1 | M3
1R37C06r57CPVMN11 | Bushy and leaf size varies | Bushy | L | N | M3
1R37C19r58CPVMN11 | Many leaves fused to form unifoliate | Long petioles | N | N | M3
1R41C06r60CPVMN11 | Rugose | Fewer pods | L | N | M3
1R41C22r61CPVMN11 | Erect growth, light green chlorotic leaves | Erect growth, light green chlorotic leaves | N | N | M3
1R42C02r62CPVMN11 | Small chlorotic plant | Small chlorotic plant | N | 1:2 | M3

(Continued)
Table 1 | Continued

Plant ID | 2010 Phenotype | 2011 Phenotype | Mat | M3 Seg | Generation
1R43C24r63CPVMN11 | Wavy curled leaves | WT pheno, note maturity | L | N | M3
1R44C20r64CPVMN11 | Bushy, wavy and curled leaves | Bushy, wavy and curled leaves | E | N | M3
2R14C01r66CPVMN11 | Stunted, cupped leaves and few flowers | Stunted, cupped leaves and few flowers | L | N | M3
2R14C13r67CPVMN11 | Stunted, rugose curled leaves, short petioles | WT pheno | N | N | M3
2R18C06r69CPVMN11 | Short, stunted, delayed | Erect | L | 2:1 | M3
2R24C27r73CPVMN10 | Short compact plant | Short compact plant | N | N | M3
2R25C11r74CPVMN11 | Slight chlorotic | Erect growth | N | N | M3
2R26C31r75CPVMN11 | Slight chlorotic | Slight chlorotic | N | N | M3
2R27C18r76CPVMN11 | Viney, spindly, few leaves, large cupped leaves | Viney, spindly, few leaves, large cupped leaves | L | N | M3
2R27C20r77CPVMN11 | Lanceolate leaves | Bushy | N | N | M3
2R29C12r78CPVMN11 | Few large leaves | WT pheno, note maturity | L | 1:2 | M3
2R33C02r80CPVMN11 | Lanceolate curled leaves, very few flowers, late maturity | Curled leaves | L | N | M3
[illegible] | Large leaves, some fused trifoliates | Bushy | N | N | M3
[illegible] | Large dark green rugose leaves | WT pheno, note maturity | L | N | M3
[illegible] | Large leaves | Long petioles, a bit bushy | N | N | M3
[illegible] | Late maturity | Late maturity | L | N | M3
[illegible] | Slight chlorotic | Interveinal chlorosis | L | N | M3
3R06C13r89CPVMN11 | Large cupped leaves | Slightly curled leaves | L | N | M3
3R07C22r90CPVMN11 | Tall, erect growth | Fewer pods, note maturity | L | N | M3
3R11C14r91CPVMN11 | Large rugose leaves | Rugose, spindly, very few pods | N | N | M3
3R16C03r92CPVMN11 | Curled lanceolate leaves | Curled lanceolate leaves | L | N | M3
3R17C22r93CPVMN11 | Rugose, short petioles | Rugose, short petioles | L | N | M3
R06C05CPVMN11 | N/A | Short, curled leaves, spotty chlorosis | N/A | N/A | M2
R09C05CPVMN11 | N/A | Few lateral branches, erect growth | N/A | N/A | M2
R10C09CPVMN11 | N/A | Spindly, small leaves | N/A | N/A | M2
R11C21CPVMN11 | N/A | Large leaves, odd nodes, rugose, many small branches | N/A | N/A | M2
R13C09CPVMN11 | N/A | Short, rugose curled leaves | N/A | N/A | M2
R15C12CPVMN11 | N/A | Short, pointed leaves | N/A | N/A | M2
R16C12CPVMN11 | N/A | Short, bushy, lacks apical dominance, many small branches, small leaves | N/A | N/A | M2
R19C22CPVMN11 | N/A | Spotty chlorosis like row 5 in M3 line | N/A | N/A | M2
R21C05CPVMN11 | N/A | Short petioles, large rugose curled leaves | N/A | N/A | M2
[illegible] | N/A | Tall, very long petioles | N/A | N/A | M2
[illegible] | N/A | Few lateral branches and flowers | N/A | N/A | M2
R24C18CPVMN11 | N/A | Stunted, small curled leaves | N/A | N/A | M2
R25C15CPVMN11 | N/A | Very stunted mini-plant | N/A | N/A | M2
R28C12CPVMN11 | N/A | Spindly, erect growth, small leaves | N/A | N/A | M2
R30C08CPVMN11 | N/A | Stunted, few branches, no flowers | N/A | N/A | M2
R40C08CPVMN11 | N/A | Stunted, small pointed leaves | N/A | N/A | M2
R47C24CPVMN11 | N/A | Short, short petioles, small leaves | N/A | N/A | M2
R47C25CPVMN11 | N/A | Short, chlorotic, pointed leaves | N/A | N/A | M2

Mat, Maturity; E, earlier than WT; N, normal; L, later than WT; M3 Seg, segregation ratio, noted if the M3 row
planting is segregating; Generation, mutant generation of seeds available.
Table 2 | Regions of natural structural variation identified within the cultivar Red Hawk.

Chr | Deletion start (bp) | Deletion stop (bp) | Deletion size (bp) | Mutants with deletion | Genes potentially affected | Deletion relative to gene
Chr01 | [illegible] | [illegible] | [illegible] | Lanceolate, Chlorotic | - | -
Chr01 | [illegible] | [illegible] | 11,114 | Lanceolate, Chlorotic | - | -
Chr01 | [illegible] | [illegible] | [illegible] | Lanceolate, Chlorotic | - | -
Chr01 | [illegible] | [illegible] | [illegible] | Lanceolate, Chlorotic | - | -
Chr01 | [illegible] | [illegible] | [illegible] | Lanceolate, Chlorotic | - | -
Chr01 | [illegible] | [illegible] | [illegible] | Lanceolate, Chlorotic | - | -
Chr01 | [illegible] | [illegible] | [illegible] | Lanceolate, Chlorotic | - | -
Chr01 | [illegible] | [illegible] | [illegible] | Lanceolate, Chlorotic | - | -
Chr01 | [illegible] | [illegible] | [illegible] | Lanceolate, Chlorotic | [illegible] | [illegible]
Chr01 | [illegible] | [illegible] | [illegible] | Lanceolate, Chlorotic | - | -
Chr02 | [illegible] | [illegible] | 245 | Lanceolate, Maturity, Chlorotic | - | -
Chr02 | [illegible] | [illegible] | 646 | Lanceolate, Maturity, Chlorotic | - | -
Chr02 | [illegible] | [illegible] | 4,223 | Lanceolate, Maturity, Chlorotic | - | -
Chr02 | [illegible] | [illegible] | 128 | Lanceolate, Maturity, Chlorotic | - | -
Chr02 | 24728318 | 24728663 | 345 | Lanceolate, Maturity, Chlorotic | - | -
Chr02 | [illegible] | [illegible] | 90 | Lanceolate, Maturity, Chlorotic | [illegible] | I
Chr02 | [illegible] | [illegible] | [illegible] | Lanceolate, Maturity, Chlorotic | - | -
Chr02 | [illegible] | [illegible] | [illegible] | Lanceolate, Maturity, Chlorotic | - | -
Chr04 | 3745720 | 3746721 | 1,001 | Maturity, Chlorotic | Phvul.004G33700 | I
Chr04 | 4160322 | 4160726 | 404 | Maturity, Chlorotic | - | -
Chr04 | 4374139 | 4386350 | 12,111 | Maturity, Chlorotic | Phvul.004G039400, Phvul.004G039500 | P, D
Chr04 | 4571494 | 4575913 | 4,419 | Maturity, Chlorotic | - | -
Chr11 | 5406621 | 5407835 | 1,214 | Lanceolate, Maturity, Chlorotic | - | -
Chr11 | 19639745 | 19639812 | 67 | Rugose, Maturity | - | -

Genomic deletions identified by the program CREST that belong to SV Class 2. All SV were visually verified using IGV. This class of SV represents regions of
heterogeneity within Red Hawk.

P, Deletion is within 1000 bp of start codon; I, Deletion is in an intron; D, Deletion is within 1000 bp of stop codon.



genomic architecture corresponded to a genic region
(Tables 4, 5). As was observed with the larger deletions, the
majority of the INDELs and SNPs mapped to intergenic regions.
Confirmation by PCR will be necessary to determine the false
discovery rate of the SNP and INDEL identification.
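
The separation of small variants into shared (Class 2) and unique (Class 3) sets, together with the minimum read depth of 6 given in the methods, can be sketched as follows. This is a simplified stand-in for the SAMtools, BEDtools, and perl steps actually used; the function name, the tuple layout, and the toy calls are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

MIN_DEPTH = 6  # calls with read depth below this value are discarded

Variant = Tuple[str, int, str]         # (chromosome, position, alternate allele)
Call = Tuple[str, int, str, int]       # variant fields plus read depth


def partition_variants(calls: Dict[str, Iterable[Call]]):
    """Split depth-filtered variants into shared and sample-unique sets."""
    carriers: Dict[Variant, Set[str]] = defaultdict(set)
    for sample, variants in calls.items():
        for chrom, pos, alt, depth in variants:
            if depth >= MIN_DEPTH:
                carriers[(chrom, pos, alt)].add(sample)
    shared = {site for site, who in carriers.items() if len(who) > 1}
    unique = {site: next(iter(who))
              for site, who in carriers.items() if len(who) == 1}
    return shared, unique


# Toy calls with invented positions, for illustration only.
toy_calls = {
    "lanceolate": [("Chr01", 100, "A", 12), ("Chr02", 200, "T", 4)],
    "chlorotic": [("Chr01", 100, "A", 9), ("Chr08", 300, "G", 7)],
}
shared_sites, unique_sites = partition_variants(toy_calls)
```

In this toy input the Chr01 site is shared by two lines, the Chr08 site is unique to one line, and the Chr02 call is dropped because its depth is below 6.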

DISCUSSION 

We demonstrate the feasibility of using high-throughput DNA
sequencing to analyze FN mutant plants. High-throughput
sequencing brings the analysis down to base-pair resolution,
allowing SNPs, indels, and larger genomic deletions to be
identified and analyzed on a single experimental platform. Our
analysis identified SV in each Red Hawk individual (wild type and
FN mutants) in comparison to the P. vulgaris (accession G 19833)
reference genome sequence (available at www.phytozome.net).
These SV are inferred to be sequences lost (deleted) from the
respective Red Hawk individuals or sequences recently gained by
G 19833. (Identifying sequences missing in G 19833 that are
present in Red Hawk requires a de novo assembly of the Red Hawk
genome, which is beyond the scope of this analysis.) We assume
that the majority of the SV are either natural or FN-induced
deletions in Red Hawk; therefore, these events will be referred
to as "deletions" throughout the discussion.

Our analysis identified three classes of SV (Figure 2). Class 1
represents the putative inter-cultivar SV: large sequence segments
that are missing from all sequenced Red Hawk individuals in this
study but are present in the current G 19833 assembly (Figure 2B).
Class 2 represents the intra-cultivar SV exhibiting differences
among Red Hawk individuals (Figure 2A). Class 3 represents SV
specific to a single Red Hawk individual, which were potentially
generated by the FN irradiation (Figure 2C). We focused our
analysis on the 23 Class 2 and 18 Class 3 SV identified here.
However, it is important to recognize that these classifications
are tentative, as a deeper sampling of Red Hawk individuals may
re-classify some variants. For example, because only one wild-type
and five mutated Red Hawk individuals were sampled, some of our
Class 3 SV may be low-frequency natural variants that would have
been identified in more than one Red Hawk individual (and thereby
be a Class 2 SV) if a larger number of genotypes had been sampled.
Similarly, the sequences missing in some Class 1 SV may be present
at low frequency in Red Hawk, in which case these SV would be
re-classified as Class 2 in a deeper sampling of genotypes. It is
probable that increasing the number of sequenced individuals from Red
Table 3 | Putative fast neutron induced genomic deletions.

Chromosome | Deletion start (bp) | Deletion stop (bp) | Deletion size (bp) | Mutant with deletion | Genes potentially affected | Deletion relative to gene
Chr01 | [illegible] | [illegible] | [illegible] | Lanceolate | - | -
Chr01 | [illegible] | [illegible] | [illegible] | Lanceolate | Phvul.001G050100 | I
Chr01 | [illegible] | [illegible] | [illegible] | Decorative | - | -
Chr01 | [illegible] | [illegible] | [illegible] | Chlorotic | - | -
Chr01 | [illegible] | [illegible] | 1,518 | Chlorotic | Phvul.001G128600 | D
Chr01 | [illegible] | [illegible] | [illegible] | Chlorotic | - | -
Chr04 | [illegible] | [illegible] | [illegible] | Maturity | - | -
Chr04 | [illegible] | [illegible] | [illegible] | Maturity | - | -
Chr04 | [illegible] | [illegible] | [illegible] | Maturity | - | -
Chr04 | [illegible] | [illegible] | [illegible] | Maturity | Phvul.004G029200 | P
Chr04 | [illegible] | [illegible] | 6,543 | Maturity | - | -
Chr04 | 3489741 | 3490354 | 613 | Maturity | Phvul.004G031300 | I
Chr04 | 3568924 | 3569058 | 134 | Maturity | Phvul.004G031900, Phvul.004G032000 | D, D
Chr04 | 3679845 | 3684349 | 4,504 | Maturity | - | -
Chr05 | 25018705 | 25018771 | 66 | Lanceolate | - | -
Chr07 | 6354277 | 6354449 | 2,762 | Lanceolate | - | -
Chr08 | 24523869 | 24566903 | 43,034 | Chlorotic | Phvul.008G141500 | G
Chr09 | 431942 | 434704 | 2,762 | Lanceolate | - | -

Class 3 SV identified by the program CREST and visually verified using IGV. This SV class identifies sequences absent from one FN line, but present in all other
samples. This class of SV is most likely a result of FN mutagenesis.

P, Deletion is within 1000 bp of start codon; I, Deletion is in an intron; D, Deletion is within 1000 bp of stop codon; G, whole gene deleted.



Table 4 | Identification and classification of SNPs in the five mutant
plants.

Number of mutants with SNP | SV class | SNPs in intergenic regions | SNPs within gene (promoter, exon, intron)
SNPs unique to 1 plant | 3 | 74,534 | 17,761
SNPs shared by 2 plants | 2 | 14,515 | 2,376
SNPs shared by 3 plants | 2 | 7,190 | 1,073
SNPs shared by 4 plants | 2 | 4,159 | 744
SNPs shared by 5 plants | 2 | 2,033 | 409

SNPs shared by more than one mutant plant (Class 2) represent natural variation
existing in the Red Hawk cultivar. SNPs unique to a single mutant (Class 3) are
likely caused by FN mutagenesis. SNPs within a gene region are more likely to
impact gene expression patterns than those in intergenic regions.



Hawk and/or G 19833 individuals would identify many additional 
Class 1 and Class 2 SV, and strengthen the confidence of the Class 
3 calls. 
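
The three-way classification described above can be expressed as a small rule over the set of sequenced samples in which a sequence is missing. The following is a minimal sketch, not the procedure actually used in this study: the sample labels are hypothetical, and treating a deletion seen only in the wild-type plant as Class 2 is an assumption made here for completeness.

```python
def classify_sv(carriers, red_hawk_samples, wild_type="WT"):
    """Assign an SV to Class 1, 2, or 3 from the samples lacking the sequence.

    carriers: samples in which the sequence is missing.
    red_hawk_samples: all sequenced Red Hawk individuals.
    wild_type: label of the non-irradiated plant (hypothetical name).
    """
    carriers = set(carriers)
    samples = set(red_hawk_samples)
    if not carriers or not carriers <= samples:
        raise ValueError("carriers must be a non-empty subset of the samples")
    if carriers == samples:
        return 1  # missing from every sequenced Red Hawk individual
    if len(carriers) == 1 and wild_type not in carriers:
        return 3  # private to a single FN mutant
    return 2      # varies among Red Hawk individuals (assumed for WT-only)


# Hypothetical labels for one wild type and the five FN mutants.
SAMPLES = ["WT", "lanceolate", "maturity", "chlorotic", "rugose", "decorative"]
```

Under this rule a deletion private to one mutant is Class 3, but it becomes Class 2 as soon as a newly sequenced individual is found to carry it, which is the re-classification risk noted in the text.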

A particularly interesting heterogeneous mixture of deletions
was identified in a 2.7 million bp region on chromosome 2 of Red
Hawk (Figure 3), representing an intriguing cluster of Class 2
SV. Sequence analysis of the 159 genes within this region revealed
that 21 are involved in disease resistance responses (one
dirigent-like protein, six leucine-rich repeat proteins, and 14
NB-ARC-LRR domain-containing disease resistance proteins). This is
consistent with previous studies, which identified an over-abundance of



Table 5 | Identification and classification of INDELs in the five mutant
plants.

Number of mutants with INDEL | SV class | INDELs in intergenic regions | INDELs within a gene (promoter, exon, intron)
INDELs unique to 1 plant | 3 | 17,606 | 2,704
INDELs shared by 2 plants | 2 | 8,042 | 1,204
INDELs shared by 3 plants | 2 | 4,771 | 740
INDELs shared by 4 plants | 2 | 2,877 | 445
INDELs shared by 5 plants | 2 | 1,977 | 271

INDELs shared by more than one mutant plant likely represent natural
variation existing in the Red Hawk cultivar (Class 2), while INDELs unique to a
single mutant (Class 3) are likely a result of FN mutagenesis. INDELs within
a gene region are more likely to impact gene expression patterns than those in
intergenic regions.



disease resistance genes located within regions of natural
variation (McHale et al., 2012). Additional regions of Class 2 SV
are apparent on chromosomes 1, 4, and 11. Analysis of the DNA-seq
data for the five individual FN lines suggests that the level of
natural variation present within the Red Hawk cultivar is higher
than that induced by FN mutagenesis. However, aside from the region
on chromosome 2, most of the Class 2 SV lie in intergenic regions
and are unlikely to affect gene expression or function.

The Class 3 deletions are particularly interesting, as they may
be associated with the mutant phenotypes found in the FN
irradiated individuals. For two of the FN lines, no unique
deletions >40 bp were found within or near gene coding regions
(although it is still possible that these lines carry heterozygous
deletions within genic regions). For the remaining three FN
lines, however, we identified candidate deletions within
gene-encoding regions that may be associated with the resulting
phenotype.
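
Tables 2 and 3 label each deletion relative to a nearby gene as P, I, D, or G. The sketch below shows one way such labels could be assigned from coordinates. It is an illustration under stated assumptions rather than the method used here: the `Gene` record, its field names, and the choice to count only the 1000 bp flanks outside the coding region are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Gene:
    """Hypothetical gene model; coordinates are 1-based and inclusive."""
    name: str
    start: int          # leftmost coding position on the reference
    end: int            # rightmost coding position on the reference
    strand: str = "+"   # "+" or "-"
    introns: list = field(default_factory=list)  # [(start, end), ...]


def overlaps(a_start, a_end, b_start, b_end):
    """True if the two closed intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


def classify(del_start, del_end, gene, flank=1000):
    """Return 'G', 'I', 'P', 'D', or None for a deletion relative to a gene."""
    if del_start <= gene.start and del_end >= gene.end:
        return "G"  # whole gene deleted
    if any(i_start <= del_start and del_end <= i_end
           for i_start, i_end in gene.introns):
        return "I"  # deletion contained within an intron
    left = overlaps(del_start, del_end, gene.start - flank, gene.start - 1)
    right = overlaps(del_start, del_end, gene.end + 1, gene.end + flank)
    # The start codon is at the left end on "+" and at the right end on "-".
    near_start, near_stop = (left, right) if gene.strand == "+" else (right, left)
    if near_start:
        return "P"
    if near_stop:
        return "D"
    return None  # farther than `flank`, or touching coding sequence only
```

A deletion that removes part of an exon without reaching either flank returns `None` in this sketch; a fuller implementation would need an additional category for that case.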

The lanceolate mutant (1R19C15r28CPVMN11) exhibited a
shorter than wild type stature with elongated, or lanceolate,
shaped leaves (Figure 1B). CREST analysis identified a single
Class 3 SV potentially altering the expression of
Phvul.001G050100 (Table 3, Figure 3). The deletion is entirely
contained within the last intron of the gene. The gene is a
member of the glycosyltransferase family 47 subgroup C. Proteins
encoded by members of this family are bound to the Golgi and are
involved in cell wall biosynthesis (Jensen et al., 2008). Altering
the expression of a member of this family/subgroup would impact
the physical properties of the pectin matrix (Jensen et al., 2008).
However, there are no reports of altered leaf phenotypes or plant
height in Arabidopsis knockout populations.

Three regions of SV belonging to Class 3 potentially affecting
gene expression in the maturity mutant (2R29C12r78CPVMN11)
were identified by CREST analysis. In the field, this plant showed
no phenotypic deviation from the wild type, except a delay
in maturity (Figure 1D). These three deletions, all on chromosome
4, potentially affect the expression pattern of four
candidate genes (Figure 3, Table 3). The first deletion lies
immediately upstream of Phvul.004G029200, which encodes a 60S
ribosomal protein. This is an unlikely candidate gene for the
phenotype observed. The second deletion is in the intron of
Phvul.004G031300. Sequence analysis failed to identify any
functional annotation for this gene. The final
deletion is immediately downstream of both Phvul.004G032000
and Phvul.004G031900. The Arabidopsis homologs of these
genes (At3g01090 and At5g19790) encode a protein kinase
and an AP2 transcription factor, respectively. AP2 transcription
factors are involved in regulating flowering and fruit ripening
(Huijser and Schmid, 2011). In Arabidopsis, the over-expression
of AP2 results in delayed flowering and maturity
(Wollmann et al., 2010). It is possible that the deletion
immediately downstream of Phvul.004G031900 causes an increase in
AP2 protein accumulation, resulting in delayed flowering and
maturity.

Finally, the mutant plant 3R05C25r87CPVMN11 exhibited
interveinal chlorosis under standard field conditions (Figure 1E).
Interveinal chlorosis is often an indicator of nutrient
deficiencies. However, DNA-seq analysis of this mutant revealed two
SV belonging to Class 3 potentially affecting gene expression
patterns (Table 3). The first is a 1518 bp deletion immediately
downstream of Phvul.001G128600. The homolog of this
gene in Arabidopsis thaliana (At3g10140) encodes a RecA protein,
involved in DNA repair by binding ssDNA (Miller-Messmer
et al., 2012). The second Class 3 SV is a large deletion spanning
43,034 bp on chromosome 8, encompassing the entire sequence
of Phvul.008G141500 (Table 3). Sequence analysis of this gene
determined that it is a member of the SNF2 helicase family. The
Arabidopsis homolog of this gene, At2g44980, is the only member
of the ALC1 SNF2 subfamily (Knizewski et al., 2008). There
are no reported data on the phenotype of an ALC1 knockout
in Arabidopsis. However, in mammalian systems, ALC1 is
essential for repairing DNA damage (Ryan and Hughes, 2011).
Down-regulation of ALC1 protein results in hypersensitivity
to damaging agents (Ryan and Hughes, 2011). We hypothesize
that a similar function is conserved in common bean. Based
on the interveinal chlorotic leaf patterning observed when this
gene is completely excised, it is possible that Phvul.008G141500
is involved in repairing damage to DNA in the leaves, possibly
caused by UV radiation.
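
The deletion sizes quoted from Table 3, such as the 43,034 bp deletion on chromosome 8, can be cross-checked against the listed coordinates, assuming size is defined as stop minus start. The sketch below uses only the rows of Table 3 whose coordinates are fully legible in this copy; the variable and function names are introduced here for illustration.

```python
# Legible rows of Table 3 as extracted:
# (chromosome, deletion start, deletion stop, reported size in bp).
ROWS = [
    ("Chr04", 3568924, 3569058, 134),
    ("Chr04", 3679845, 3684349, 4504),
    ("Chr05", 25018705, 25018771, 66),
    ("Chr07", 6354277, 6354449, 2762),
    ("Chr08", 24523869, 24566903, 43034),
    ("Chr09", 431942, 434704, 2762),
]


def inconsistent_rows(rows):
    """Return the rows whose reported size differs from stop - start."""
    return [row for row in rows if row[2] - row[1] != row[3]]


if __name__ == "__main__":
    for chrom, start, stop, size in inconsistent_rows(ROWS):
        print(f"{chrom}:{start}-{stop} reported {size} bp, "
              f"coordinates span {stop - start} bp")
```

With the values as they appear here, only the Chr07 row is flagged, so either its size or its coordinates may have been altered in transcription; the chromosome 8 and chromosome 9 entries are internally consistent.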

We identified Class 3 SV, likely a result of FN mutagenesis,
ranging from 1 bp substitutions to 43,034 bp deletions, though
changes <40 bp have not been visually verified. In the mutated
plants, the level of FN irradiation used (16 Gy) induced far
more small (<50 bp) changes, including single base pair
substitutions and deletions, than large genomic deletions. These
results are consistent with the FN-induced deletions identified in
a recent Arabidopsis study (Belfield et al., 2012). The scarcity of
large Class 3 SV regions such as those seen in the well-characterized
soybean population (Bolon et al., 2011) may be due to our analysis
pipeline requiring the deletion to be complete (i.e., homozygous).
It is possible that some larger deletions are present in these lines
but are not yet homozygous in the M2 generation and, as such, were
not identified by our analysis. In the soybean FN population, 52
and 38% of the mutants with visual phenotypes were identified from
seeds treated with 16 and 32 Gy, respectively (Bolon et al., 2011).
None of the P. vulgaris plants irradiated at 32 Gy germinated in
the field, suggesting that the common bean genome may be less
resilient to disruption than the soybean genome. The soybean genome
may accommodate larger genomic deletions by compensating for gene
loss through altered expression patterns of duplicated genes.
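
To illustrate why a pipeline that requires complete loss of sequence would miss heterozygous deletions, the sketch below labels a candidate region from its read depth relative to the genome-wide mean. This is not the CREST-based procedure used in this study; it is a generic depth heuristic, and both thresholds (`hom_max`, `het_max`) are arbitrary assumptions chosen for the example.

```python
from statistics import mean


def deletion_zygosity(region_depths, genome_mean_depth,
                      hom_max=0.1, het_max=0.75):
    """Label a candidate deletion from per-base read depths across the region.

    hom_max and het_max are assumed cut-offs on the depth ratio.
    """
    if genome_mean_depth <= 0:
        raise ValueError("genome_mean_depth must be positive")
    ratio = mean(region_depths) / genome_mean_depth
    if ratio <= hom_max:
        return "homozygous deletion"      # essentially no reads remain
    if ratio <= het_max:
        return "possible heterozygous deletion"  # roughly half coverage
    return "no deletion"
```

A region at about half the genome-wide depth is labeled a possible heterozygous deletion; such regions still contain reads from the intact allele and would not satisfy a requirement that the sequence be completely absent.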

SUMMARY 

An FN population with 88 individuals or bulk seed from the M2
and M3 generations is available upon request. We have illustrated
the utility of DNA-seq to identify three classes of SV
in P. vulgaris individuals. These analyses were greatly facilitated
by the availability of the P. vulgaris genome sequence.
In the Red Hawk cultivar, natural variation is clustered in
regions throughout the genome. These regions of natural variation
illustrate the existing genetic potential of common bean
germplasm. Our analyses also identified genomic deletions
resulting from FN mutagenesis and candidate genes potentially
responsible for the altered phenotype in three of the plants
selected.